Warning: file_put_contents(aCache/aDaily/post/opendatascience/-2330-2331-): Failed to open stream: No space left on device in /var/www/tg-me/post.php on line 50
Data Science by ODS.ai 🦜 | Telegram Webview: opendatascience/2331 -
Telegram Group & Telegram Channel
βš™οΈ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks.

Researchers built an automated system to collect and validate thousands of real-world tasks from GitHub, designed for training and evaluation of LLMs in software engineering.

Main features of the system:
1️⃣ Automatic data collection: Continuously extracts issue-PR pairs from Python repositories.
2️⃣ LLM-based environment setup: LLM analyzes repositories, creates install instructions, and updates them if errors happen.
3️⃣ Execution-based validation: Each task is tested by automatic setup, test run, and dependency freezing to make it reproducible.
4️⃣ LLM quality annotation: Tasks are labeled for clarity, difficulty, and test correctness to support filtering.

Result:
SWE-rebench dataset: 21,000+ ready-to-use interactive tasks.
Continuous updates: Fresh data is added regularly.
Transparent evaluation: Tasks are used for public SWE-rebench leaderboard.

πŸš€ SWE-rebench gives researchers and developers real and validated tasks to work with LLMs in SWE field.

Technical report: arXiv
Dataset: SWE-rebench



tg-me.com/opendatascience/2331
Create:
Last Update:

βš™οΈ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks.

Researchers built an automated system to collect and validate thousands of real-world tasks from GitHub, designed for training and evaluation of LLMs in software engineering.

Main features of the system:
1️⃣ Automatic data collection: Continuously extracts issue-PR pairs from Python repositories.
2️⃣ LLM-based environment setup: LLM analyzes repositories, creates install instructions, and updates them if errors happen.
3️⃣ Execution-based validation: Each task is tested by automatic setup, test run, and dependency freezing to make it reproducible.
4️⃣ LLM quality annotation: Tasks are labeled for clarity, difficulty, and test correctness to support filtering.

Result:
SWE-rebench dataset: 21,000+ ready-to-use interactive tasks.
Continuous updates: Fresh data is added regularly.
Transparent evaluation: Tasks are used for public SWE-rebench leaderboard.

πŸš€ SWE-rebench gives researchers and developers real and validated tasks to work with LLMs in SWE field.

Technical report: arXiv
Dataset: SWE-rebench

BY Data Science by ODS.ai 🦜





Share with your friend now:
tg-me.com/opendatascience/2331

View MORE
Open in Telegram


Data Science by ODS ai 🦜 Telegram | DID YOU KNOW?

Date: |

The lead from Wall Street offers little clarity as the major averages opened lower on Friday and then bounced back and forth across the unchanged line, finally finishing mixed and little changed.The Dow added 33.18 points or 0.10 percent to finish at 34,798.00, while the NASDAQ eased 4.54 points or 0.03 percent to close at 15,047.70 and the S&P 500 rose 6.50 points or 0.15 percent to end at 4,455.48. For the week, the Dow rose 0.6 percent, the NASDAQ added 0.1 percent and the S&P gained 0.5 percent.The lackluster performance on Wall Street came on uncertainty about the outlook for the markets following recent volatility.

Data Science by ODS ai 🦜 from jp


Telegram Data Science by ODS.ai 🦜
FROM USA